Using Sociolinguistic Inspired Features for Gender Classification of Web Authors
نویسندگان
چکیده
In this article we present a methodology for classification of text from web authors, using sociolinguistic inspired text features. The proposed methodology uses a baseline text mining based feature set, which is combined with text features that quantify results from theoretical and sociolinguistic studies. Two combination approaches were evaluated and the evaluation results indicated a significant improvement in both combination cases. For the best performing combination approach the accuracy was 84.36%, in terms of percentage of correctly classified web posts.
منابع مشابه
Evaluation and Sociolinguistic Analysis of Text Features for Gender and Age Identification
Corresponding Author: Vasiliki Simaki Centre for Languages and Literature, Lund University, Lund, Sweden Email: [email protected] Abstract: The paper presents an interdisciplinary study in the field of automatic gender and age identification, under the scope of sociolinguistic knowledge on gendered and age linguistic choices that social media users make. The authors investigated and...
متن کاملComputational analysis to explore authors' depiction of characters
This study involves automatically identifying the sociolinguistic characteristics of fictional characters in plays by analyzing their written “speech”. We discuss three binary classification problems: predicting the characters’ gender (male vs. female), age (young vs. old), and socioeconomic standing (upper-middle class vs. lower class). The text corpus used is an annotated collection of August...
متن کاملA Document Weighted Approach for Gender and Age Prediction Based on Term Weight Measure
Author profiling is a text classification technique, which is used to predict the profiles of unknown text by analyzing their writing styles. Author profiles are the characteristics of the authors like gender, age, nativity language, country and educational background. The existing approaches for Author Profiling suffered from problems like high dimensionality of features and fail to capture th...
متن کاملA Novel Approach to Feature Selection Using PageRank algorithm for Web Page Classification
In this paper, a novel filter-based approach is proposed using the PageRank algorithm to select the optimal subset of features as well as to compute their weights for web page classification. To evaluate the proposed approach multiple experiments are performed using accuracy score as the main criterion on four different datasets, namely WebKB, Reuters-R8, Reuters-R52, and 20NewsGroups. By analy...
متن کاملAuthor gender identification from text using Bayesian Random Forest
Nowadays high usage of users from virtual environments and their connection via social networks like Facebook, Instagram, and Twitter shows the necessity of finding out shared subjects in this environment more than before. There are several applications that benefit from reliable methods for inferring age and gender of users in social media. Such applications exist across a wide area of fields,...
متن کامل